Qubrid AI
DeepSeek · Chat / LLM · 70B Parameters · 64K Context
Tags: Streaming · Reasoning · Chain-of-Thought · Code · Long Context · Chat

Overview

DeepSeek R1 Distill LLaMA 70B is a knowledge-distilled reasoning model built on the LLaMA-3.1-70B architecture, trained on high-quality reasoning outputs from DeepSeek R1. It delivers near frontier-level analytical performance while running on significantly smaller hardware than the full R1 model — making it ideal for teams that need powerful chain-of-thought reasoning without the infrastructure overhead of a 671B-parameter system. Served instantly via the Qubrid AI Serverless API.
🧠 Frontier reasoning. Distilled efficiency. Run DeepSeek R1 intelligence on Qubrid AI — no GPUs, no setup, no ops.

Model Specifications

Field              Details
Model ID           deepseek-ai/deepseek-r1-distill-llama-70b
Provider           DeepSeek
Kind               Chat / LLM
Architecture       LLaMA-3.1-70B (Distilled)
Parameters         70B
Context Length     64,000 tokens
MoE                No
Release Date       January 2025
License            DeepSeek R1 License (MIT)
Training Data      Distilled from high-quality DeepSeek R1 reasoning outputs onto LLaMA 70B
Function Calling   Not supported
Image Support      N/A
Serverless API     Available
Fine-tuning        Coming soon
On-demand          Coming soon
State              🟢 Ready

Pricing

💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.

Token Type      Price per 1M Tokens
Input Tokens    $1.20
Output Tokens   $1.80
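As a quick sanity check on spend, the per-token rates above can be turned into a simple cost estimator (a sketch using only the two rates from the pricing table, converted from price per 1M tokens):

```python
# Estimate per-request cost from the serverless pricing table
# ($1.20 per 1M input tokens, $1.80 per 1M output tokens).
INPUT_PRICE_PER_M = 1.20
OUTPUT_PRICE_PER_M = 1.80

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 10,000-token reasoning response
print(f"${estimate_cost(2_000, 10_000):.4f}")  # → $0.0204
```

Reasoning models tend to spend many output tokens on the chain of thought, so output pricing usually dominates.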

Quickstart

Prerequisites

  1. Create a free account at platform.qubrid.com
  2. Generate your API key from the API Keys section
  3. Replace QUBRID_API_KEY in the code below with your actual key
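Rather than pasting the key directly into source files, you can keep it in an environment variable (a sketch; "your-key-here" is a placeholder, not a real key format):

```shell
# Keep the key out of source code: export it once per shell session
export QUBRID_API_KEY="your-key-here"

# Verify it is set before running the snippets below
echo "${QUBRID_API_KEY:?QUBRID_API_KEY is not set}"
```

Your code can then read the key via `os.environ` (Python) or `process.env` (Node.js) instead of a hard-coded string.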

Python

from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
  base_url="https://platform.qubrid.com/v1",
  api_key="QUBRID_API_KEY",  # replace with your actual key
)

# Create a streaming chat completion
stream = client.chat.completions.create(
  model="deepseek-ai/deepseek-r1-distill-llama-70b",
  messages=[
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  max_tokens=10000,
  temperature=0.3,
  top_p=1,
  stream=True
)

# Print tokens as they arrive
for chunk in stream:
  if chunk.choices and chunk.choices[0].delta.content:
      print(chunk.choices[0].delta.content, end="", flush=True)
print()

# With stream=False, the call returns a single completion object instead;
# read it with: print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

// Initialize the OpenAI client with Qubrid base URL
const client = new OpenAI({
  baseURL: "https://platform.qubrid.com/v1",
  apiKey: "QUBRID_API_KEY",
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: "deepseek-ai/deepseek-r1-distill-llama-70b",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 10000,
  temperature: 0.3,
  top_p: 1,
  stream: true,
});

// Print tokens as they arrive
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log();

// With stream: false, the call resolves to a single completion instead;
// read it with: console.log(completion.choices[0].message.content);

Go

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"

	data := map[string]interface{}{
		"model": "deepseek-ai/deepseek-r1-distill-llama-70b",
		"messages": []map[string]string{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 0.3,
		"max_tokens":  10000,
		"stream":      true,
		"top_p":       1,
	}

	jsonData, err := json.Marshal(data)
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	// Each non-empty line is a raw server-sent event, e.g. `data: {...}`
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if line != "" {
			fmt.Println(line)
		}
	}
}

cURL

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "deepseek-ai/deepseek-r1-distill-llama-70b",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  "temperature": 0.3,
  "max_tokens": 10000,
  "stream": true,
  "top_p": 1
}'

Live Example

Prompt: Explain quantum computing in simple terms
Response:
<think>
The user wants a simple, accessible explanation of quantum computing.
I'll cover the core concepts — bits vs qubits, superposition,
entanglement — and ground it in practical applications.
</think>

Quantum computing, simplified:

Classical computers process information as bits — each is either 0 or 1.

Quantum computers use qubits, which exploit quantum mechanics to exist
as 0, 1, or both simultaneously (superposition). This allows them to
evaluate enormous numbers of possibilities in parallel.

Two foundational principles:

  1. Superposition  — a qubit can represent multiple states at once
                      until it is measured
  2. Entanglement   — qubits can be correlated so that the state of
                      one instantly informs the state of another,
                      regardless of distance

Why does this matter?

Quantum computers aren't faster at everything — they're exponentially
more powerful for specific problem types:

  - Simulating molecular and chemical interactions (drug discovery)
  - Breaking and building cryptographic systems
  - Optimization across logistics, finance, and scheduling
  - Accelerating certain machine learning workloads

The practical reality: we're still in the early era of quantum computing.
Today's machines are noisy and error-prone. But progress is accelerating,
and the implications for science and industry are profound.
Try it yourself in the Qubrid AI Playground →
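As the transcript above shows, the model wraps its chain of thought in `<think>` tags before the final answer. If you only want the answer, one way to separate the two is a small parser (a sketch, assuming the reasoning appears as a single leading `<think>...</think>` block):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer) on <think>...</think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        # No reasoning block found: treat the whole text as the answer
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

raw = "<think>User wants a simple answer.</think>\n\nQubits can be 0, 1, or both."
reasoning, answer = split_reasoning(raw)
print(answer)  # → Qubits can be 0, 1, or both.
```

When streaming, you would accumulate the chunks into one string first and then split.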

Playground Features

The Qubrid AI Playground lets you interact with DeepSeek R1 Distill LLaMA 70B directly in your browser โ€” no setup, no code, no cost to explore.

🧠 System Prompt

Shape the model's reasoning approach, output format, and domain focus before the conversation begins — ideal for technical assistants, structured analysis pipelines, and multi-turn problem-solving workflows.
Example: "You are a highly precise technical analyst. Think through every
problem step by step before answering. Show your reasoning clearly,
and flag any assumptions you make along the way."
Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.

🎯 Few-Shot Examples

Guide the model's reasoning depth and output structure with concrete examples — no fine-tuning, no retraining required.
User Input: What is the time complexity of merge sort?
Assistant Response: Merge sort has O(n log n) time complexity in all cases — best, average, and worst. This is because the array is divided log n times and each division requires O(n) work to merge.

User Input: Solve: if 3x + 7 = 22, what is x?
Assistant Response: Step 1: Subtract 7 from both sides → 3x = 15. Step 2: Divide by 3 → x = 5.
💡 Stack multiple few-shot examples in the Qubrid Playground to shape reasoning style and output format — no fine-tuning required.
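Outside the Playground, the same few-shot pattern can be expressed through the OpenAI-compatible API as alternating user/assistant turns placed ahead of the real question (a sketch; the worked examples are taken from the table above, and the final question is illustrative):

```python
# Few-shot prompting via the chat API: prior user/assistant turns
# act as worked examples that shape the style of the real answer.
few_shot_messages = [
    {"role": "system", "content": "You are a highly precise technical analyst."},
    # Example turn 1
    {"role": "user", "content": "What is the time complexity of merge sort?"},
    {"role": "assistant", "content": "Merge sort has O(n log n) time complexity in all cases."},
    # Example turn 2
    {"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"},
    {"role": "assistant", "content": "Step 1: 3x = 15. Step 2: x = 5."},
    # The real question goes last
    {"role": "user", "content": "What is the time complexity of binary search?"},
]

# Pass this list as `messages` in client.chat.completions.create(...)
print(len(few_shot_messages), "messages")
```

Keeping the examples short and consistently formatted gives the model a clear template to imitate.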

Inference Parameters

Parameter          Type     Default  Description
Streaming          boolean  true     Enable streaming responses for real-time output
Temperature        number   0.3      Controls creativity and randomness; higher values produce more diverse output
Max Tokens         number   10000    Maximum number of tokens the model is allowed to generate
Top P              number   1        Nucleus sampling: limits token selection to a subset of top probability mass
Reasoning Effort   select   medium   Adjusts the depth of reasoning and problem-solving effort; higher settings yield more thorough responses at the cost of latency
Reasoning Summary  select   auto     Controls verbosity of reasoning explanations: auto lets the model decide, concise gives brief summaries, detailed offers in-depth explanations
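The first four parameters map directly onto the request body of the chat completions endpoint. A sketch of a payload using the defaults above (Reasoning Effort and Reasoning Summary are documented here as Playground controls; this page does not confirm them as API request fields, so they are omitted):

```python
# Default inference parameters from the table, as a request payload
payload = {
    "model": "deepseek-ai/deepseek-r1-distill-llama-70b",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "stream": True,       # real-time token output
    "temperature": 0.3,   # low randomness suits reasoning tasks
    "max_tokens": 10000,  # generous budget for chain-of-thought output
    "top_p": 1,           # full probability mass (nucleus sampling off)
}
print(sorted(payload))
```

The same keys appear in the Python, JavaScript, Go, and cURL quickstarts above.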

Use Cases

  1. Advanced reasoning and problem solving
  2. Conversational AI
  3. Technical and coding assistance
  4. Long-form text generation
  5. Math and logic tasks
  6. Research and analysis

Strengths & Limitations

Strengths:
  - Excellent reasoning and chain-of-thought capability
  - Lower GPU memory requirement compared to the full R1 model
  - Strong performance across technical and multilingual tasks
  - Open-source and suitable for on-prem deployment

Limitations:
  - Slightly slower than smaller distilled models
  - Reasoning quality may vary in very complex tasks
  - Function calling not supported

Why Qubrid AI?

  • 🚀 No infrastructure setup — serverless API, pay only for what you use
  • 🔄 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
  • 🧠 Reasoning at scale — distilled R1 intelligence served with Qubrid's low-latency infrastructure
  • 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
  • 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
  • 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box

Resources

Resource          Link
📖 Qubrid Docs    docs.platform.qubrid.com
🎮 Playground     Try DeepSeek R1 Distill LLaMA 70B live
🔑 API Keys       Get your API Key
🤗 Hugging Face   deepseek-ai/deepseek-r1-distill-llama-70b
💬 Discord        Join the Qubrid Community

Built with ❤️ by Qubrid AI

Frontier models. Serverless infrastructure. Zero friction.